深度神经网络是当今离线增强学习中最常用的功能近似值。先前的工作表明,接受TD学习和梯度下降训练的神经网可以表现出隐式正则化,可以通过这些网络的参数化不足来表征。具体而言,已经观察到在训练期间,倒数第二个特征层的排名(也称为\ textit {有效等级})急剧崩溃。反过来,这种崩溃被认为是为了降低模型在学习后期进一步适应的能力,从而导致最终表现降低。有效等级和绩效之间的这种关联使离线RL的有效等级引人注目,主要用于离线政策评估。在这项工作中,我们对三个离线RL数据集的有效等级与绩效之间的关系进行了仔细的实证研究:Bsuite,Atari和DeepMind Lab。我们观察到,直接关联仅存在于受限的设置中,并且在更广泛的超参数扫描中消失。此外,我们从经验上确定了三个学习的阶段,这些阶段解释了隐式正则化对学习动力学的影响,并发现单独进行引导不足以解释有效等级的崩溃。此外,我们表明其他几个因素可能会混淆有效的等级与绩效之间的关系,并得出结论,在简单假设下研究这种关联可能会产生高度误导。
translated by 谷歌翻译
我们确定和研究政策流失的现象,即基于价值的强化学习中贪婪政策的快速变化。政策流失以惊人的快速步伐运作,改变了少数学习更新(在Atari上的DQN等典型的深层RL设置中)中大量州的贪婪行动。我们从经验上表征了现象,验证它不限于特定算法或环境特性。许多消融有助于削弱关于为什么流失仅与深度学习有关的少数相关的合理解释。最后,我们假设政策流失是一种有益但被忽视的隐性探索形式,它以新鲜的方式铸造了$ \ epsilon $ greedy探索,即$ \ epsilon $ - noise的作用比预期的要小得多。
translated by 谷歌翻译
The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance.
translated by 谷歌翻译
We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across states. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model. This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We transform these pseudo-counts into exploration bonuses and obtain significantly improved exploration in a number of hard games, including the infamously difficult MONTEZUMA'S REVENGE.
translated by 谷歌翻译
In this paper, we present a modular methodology that combines state-of-the-art methods in (stochastic) machine learning with traditional methods in rule learning to provide efficient and scalable algorithms for the classification of vast data sets, while remaining explainable. Apart from evaluating our approach on the common large scale data sets MNIST, Fashion-MNIST and IMDB, we present novel results on explainable classifications of dental bills. The latter case study stems from an industrial collaboration with Allianz Private Krankenversicherungs-Aktiengesellschaft which is an insurance company offering diverse services in Germany.
translated by 谷歌翻译
Key Point Analysis(KPA) is a relatively new task in NLP that combines summarization and classification by extracting argumentative key points (KPs) for a topic from a collection of texts and categorizing their closeness to the different arguments. In our work, we focus on the legal domain and develop methods that identify and extract KPs from premises derived from texts of judgments. The first method is an adaptation to an existing state-of-the-art method, and the two others are new methods that we developed from scratch. We present our methods and examples of their outputs, as well a comparison between them. The full evaluation of our results is done in the matching task -- match between the generated KPs to arguments (premises).
translated by 谷歌翻译
We show how the inherent, but often neglected, properties of large-scale LiDAR point clouds can be exploited for effective self-supervised representation learning. To this end, we design a highly data-efficient feature pre-training backbone that significantly reduces the amount of tedious 3D annotations to train state-of-the-art object detectors. In particular, we propose a Masked AutoEncoder (MAELi) that intuitively utilizes the sparsity of the LiDAR point clouds in both, the encoder and the decoder, during reconstruction. This results in more expressive and useful features, directly applicable to downstream perception tasks, such as 3D object detection for autonomous driving. In a novel reconstruction scheme, MAELi distinguishes between free and occluded space and leverages a new masking strategy which targets the LiDAR's inherent spherical projection. To demonstrate the potential of MAELi, we pre-train one of the most widespread 3D backbones, in an end-to-end fashion and show the merit of our fully unsupervised pre-trained features on several 3D object detection architectures. Given only a tiny fraction of labeled frames to fine-tune such detectors, we achieve significant performance improvements. For example, with only $\sim800$ labeled frames, MAELi features improve a SECOND model by +10.09APH/LEVEL 2 on Waymo Vehicles.
translated by 谷歌翻译
Research connecting text and images has recently seen several breakthroughs, with models like CLIP, DALL-E 2, and Stable Diffusion. However, the connection between text and other visual modalities, such as lidar data, has received less attention, prohibited by the lack of text-lidar datasets. In this work, we propose LidarCLIP, a mapping from automotive point clouds to a pre-existing CLIP embedding space. Using image-lidar pairs, we supervise a point cloud encoder with the image CLIP embeddings, effectively relating text and lidar data with the image domain as an intermediary. We show the effectiveness of LidarCLIP by demonstrating that lidar-based retrieval is generally on par with image-based retrieval, but with complementary strengths and weaknesses. By combining image and lidar features, we improve upon both single-modality methods and enable a targeted search for challenging detection scenarios under adverse sensor conditions. We also use LidarCLIP as a tool to investigate fundamental lidar capabilities through natural language. Finally, we leverage our compatibility with CLIP to explore a range of applications, such as point cloud captioning and lidar-to-image generation, without any additional training. We hope LidarCLIP can inspire future work to dive deeper into connections between text and point cloud understanding. Code and trained models available at https://github.com/atonderski/lidarclip.
translated by 谷歌翻译
The quality of consequences in a decision making problem under (severe) uncertainty must often be compared among different targets (goals, objectives) simultaneously. In addition, the evaluations of a consequence's performance under the various targets often differ in their scale of measurement, classically being either purely ordinal or perfectly cardinal. In this paper, we transfer recent developments from abstract decision theory with incomplete preferential and probabilistic information to this multi-target setting and show how -- by exploiting the (potentially) partial cardinal and partial probabilistic information -- more informative orders for comparing decisions can be given than the Pareto order. We discuss some interesting properties of the proposed orders between decision options and show how they can be concretely computed by linear optimization. We conclude the paper by demonstrating our framework in an artificial (but quite real-world) example in the context of comparing algorithms under different performance measures.
translated by 谷歌翻译
The last decade has witnessed the breakthrough of deep neural networks (DNNs) in many fields. With the increasing depth of DNNs, hundreds of millions of multiply-and-accumulate (MAC) operations need to be executed. To accelerate such operations efficiently, analog in-memory computing platforms based on emerging devices, e.g., resistive RAM (RRAM), have been introduced. These acceleration platforms rely on analog properties of the devices and thus suffer from process variations and noise. Consequently, weights in neural networks configured into these platforms can deviate from the expected values, which may lead to feature errors and a significant degradation of inference accuracy. To address this issue, in this paper, we propose a framework to enhance the robustness of neural networks under variations and noise. First, a modified Lipschitz constant regularization is proposed during neural network training to suppress the amplification of errors propagated through network layers. Afterwards, error compensation is introduced at necessary locations determined by reinforcement learning to rescue the feature maps with remaining errors. Experimental results demonstrate that inference accuracy of neural networks can be recovered from as low as 1.69% under variations and noise back to more than 95% of their original accuracy, while the training and hardware cost are negligible.
translated by 谷歌翻译